Performance of Various Machine Learning Classifiers on Small Datasets with Varying Dimensionalities: A Study
نویسندگان
چکیده
Classification is an important supervised learning technique that is used by many applications. An important factor on which the performance of a classifier depends is the size of the dataset using which the classifier is going to be trained. In this manuscript the authors have analyzed five different classification techniques (namely decision trees, KNN, SVM, linear discriminant and Ensemble method) in terms of AUC and predictive accuracy when trained using small datasets with different dimensionalities. The study has been carried out using a dataset with 24 features and 400 instances (samples). The results show that, in general ensemble method (using boosted trees) performed better than others but its performance degraded a bit with reduced dimensionality.
منابع مشابه
A Pre-Trained Ensemble Model for Breast Cancer Grade Detection Based on Small Datasets
Background and Purpose: Nowadays, breast cancer is reported as one of the most common cancers amongst women. Early detection of the cancer type is essential to aid in informing subsequent treatments. The newest proposed breast cancer detectors are based on deep learning. Most of these works focus on large-datasets and are not developed for small datasets. Although the large datasets might lead ...
متن کاملApplication of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
متن کاملEnhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملAutomated Detection of Multiple Sclerosis Lesions Using Texture-based Features and a Hybrid Classifier
Background: Multiple Sclerosis (MS) is the most frequent non-traumatic neurological disease capable of causing disability in young adults. Detection of MS lesions with magnetic resonance imaging (MRI) is the most common technique. However, manual interpretation of vast amounts of data is often tedious and error-prone. Furthermore, changes in lesions are often subtle and extremely unrepresentati...
متن کاملA Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016